{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c2e682f2-7707-4f58-9d3d-7dec60862d10",
   "metadata": {},
   "source": [
    "# LocalAI and OpenVINO\n",
    "\n",
    "[LocalAI](https://localai.io/) is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families and architectures. Does not require GPU. It is created and maintained by `Ettore Di Giacinto`.\n",
    "\n",
    "In this tutorial we show how to prepare a model config and launch an OpenVINO LLM model with LocalAI in docker container. \n",
    "\n",
    "#### Table of contents:\n",
    "\n",
    "- [Prepare Docker](#Prepare-Docker)\n",
    "- [Prepare a model](#Prepare-a-model)\n",
    "- [Run the server](#Run-the-server)\n",
    "- [Send a client request](#Send-a-client-request)\n",
    "- [Stop the server](#Stop-the-server)\n",
    "\n",
    "### Installation Instructions\n",
    "\n",
    "This is a self-contained example that relies solely on its own code.\n",
    "\n",
    "We recommend  running the notebook in a virtual environment. You only need a Jupyter server to start.\n",
    "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n",
    "\n",
    "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/localai/localai.ipynb\" />\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3778d25",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "from pathlib import Path\n",
    "\n",
    "if not Path(\"notebook_utils.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
    "    )\n",
    "    open(\"notebook_utils.py\", \"w\").write(r.text)\n",
    "\n",
    "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
    "from notebook_utils import collect_telemetry\n",
    "\n",
    "collect_telemetry(\"localai.ipynb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00710cc8-8574-4ff4-983f-cf844347f366",
   "metadata": {},
   "source": [
    "## Prepare Docker\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "Install [Docker Engine](https://docs.docker.com/engine/install/), including its [post-installation](https://docs.docker.com/engine/install/linux-postinstall/) steps, on your development system. To verify installation, test it, using the following command. When it is ready, it will display a test image and a message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d5eafe31-c3f6-4fd7-9e0d-587c03431955",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Hello from Docker!\n",
      "This message shows that your installation appears to be working correctly.\n",
      "\n",
      "To generate this message, Docker took the following steps:\n",
      " 1. The Docker client contacted the Docker daemon.\n",
      " 2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n",
      "    (amd64)\n",
      " 3. The Docker daemon created a new container from that image which runs the\n",
      "    executable that produces the output you are currently reading.\n",
      " 4. The Docker daemon streamed that output to the Docker client, which sent it\n",
      "    to your terminal.\n",
      "\n",
      "To try something more ambitious, you can run an Ubuntu container with:\n",
      " $ docker run -it ubuntu bash\n",
      "\n",
      "Share images, automate workflows, and more with a free Docker ID:\n",
      " https://hub.docker.com/\n",
      "\n",
      "For more examples and ideas, visit:\n",
      " https://docs.docker.com/get-started/\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!docker run hello-world"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6c47cd8-a22b-4754-918e-177a5f91a70d",
   "metadata": {},
   "source": [
    "### Prepare a model\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "LocalAI allows to use customized models. For more details you can read the [instruction](https://localai.io/docs/getting-started/customize-model/) where you can also find the detailed documentation. We will use one of the OpenVINO optimized LLMs in the collection on the [collection on 🤗Hugging Face](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd). In this example we will use [TinyLlama-1.1B-Chat-v1.0-fp16-ov](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-fp16-ov). First of all we should create a model configuration file:\n",
    "\n",
    "```YAML\n",
    "name: TinyLlama-1.1B-Chat-v1.0-fp16-ov\n",
    "backend: transformers\n",
    "parameters:\n",
    "  model: OpenVINO/TinyLlama-1.1B-Chat-v1.0-fp16-ov\n",
    "  temperature: 0.2\n",
    "  top_k: 40\n",
    "  top_p: 0.95\n",
    "  max_new_tokens: 32\n",
    "  \n",
    "type: OVModelForCausalLM\n",
    "\n",
    "template:\n",
    "  chat_message: |\n",
    "    <|im_start|>{{if eq .RoleName \"assistant\"}}assistant{{else if eq .RoleName \"system\"}}system{{else if eq .RoleName \"user\"}}user{{end}}\n",
    "    {{if .Content}}{{.Content}}{{end}}<|im_end|>\n",
    "  chat: |\n",
    "    {{.Input}}\n",
    "    <|im_start|>assistant\n",
    "    \n",
    "  completion: |\n",
    "    {{.Input}}\n",
    "\n",
    "stopwords:\n",
    "- <|im_end|>\n",
    "```\n",
    "The fields `backend`, `model`,  `type` you can find in the code example on the model page (we added the corresponding comments):\n",
    "```python\n",
    "from transformers import AutoTokenizer   # backend\n",
    "from optimum.intel.openvino import OVModelForCausalLM  # type\n",
    "\n",
    "model_id = \"OpenVINO/TinyLlama-1.1B-Chat-v1.0-fp16-ov\"  # parameters.model\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "model = OVModelForCausalLM.from_pretrained(model_id)\n",
    "```\n",
    "The name you can choose by yourself. By this name you will specify what model to use on the client side.\n",
    "\n",
    "\n",
    "You can create a GitHub gist and modify fields: [`ov.yaml`](https://gist.githubusercontent.com/aleksandr-mokrov/f007c8fa6036760a856ddc60f605a0b0/raw/9d24ceeb487f9c058a943113bd0290e8ae565b3e/ov.yaml)\n",
    "\n",
    "Description of the parameters used in config YAML file can be found [here](https://localai.io/advanced/#advanced-configuration-with-yaml-files).\n",
    "\n",
    "The most important:\n",
    "\n",
    "- `name` - model name, used to identify the model in API calls.\n",
    "- `backend` - backend to use for computation (like llama-cpp, diffusers, whisper, transformers).\n",
    "- `parameters.model` - relative to the models path.\n",
    "- `temperature`, `top_k`, `top_p`, `max_new_tokens` - parameters for the model.\n",
    "- `type` - type of configuration, often related to the type of task or model architecture.\n",
    "- `template` - templates for various types of model interactions.\n",
    "- `stopwords` - Words or phrases that halts processing.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99fc18ad-28d8-4225-9fc6-f083d5c4c5f5",
   "metadata": {},
   "source": [
    "### Run the server\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Everything is ready for launch. Use `quay.io/go-skynet/local-ai:v2.23.0-ffmpeg` image that contains all required dependencies. For more details read [Run with container images](https://localai.io/basics/container/#standard-container-images).\n",
    "If you want to see the output remove the `-d` flag and send a client request from a separate notebook. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf840f51-fe93-4aad-acc8-48f76f82c3c1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "c9e6ca714193e9457c461751ae0a10933675c67a3f76b2e1855d6dae149b2bce\n"
     ]
    }
   ],
   "source": [
    "!docker run -d --rm --name=\"localai\" -p 8080:8080 quay.io/go-skynet/local-ai:master-sycl-f16-ffmpeg https://gist.githubusercontent.com/aleksandr-mokrov/f007c8fa6036760a856ddc60f605a0b0/raw/9d24ceeb487f9c058a943113bd0290e8ae565b3e/ov.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17972530-9c4c-4ab3-9eeb-02ace2676460",
   "metadata": {},
   "source": [
    "Check whether the `localai` container is running normally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6b10cc0e-1272-4dc0-b666-29de071600b1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "c9e6ca714193   quay.io/go-skynet/local-ai:master-sycl-f16-ffmpeg   \"/build/entrypoint.s…\"   1 second ago   Up Less than a second (health: starting)   0.0.0.0:7860->8080/tcp, [::]:7860->8080/tcp   localai\n"
     ]
    }
   ],
   "source": [
    "!docker ps | grep localai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b1a0a83-ec63-4934-929d-b096199e09e9",
   "metadata": {},
   "source": [
    "### Send a client request\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Now you can send HTTP requests using the model name `TinyLlama-1.1B-Chat-v1.0-fp16-ov`. More details how to use [OpenAI API](https://platform.openai.com/docs/api-reference/chat)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0e0f6c3-2b6d-4f18-a32b-9b154a0ea858",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"created\":1732756622,\"object\":\"text_completion\",\"id\":\"af66405e-1579-41a5-b18e-7ce5b4292a63\",\"model\":\"TinyLlama-1.1B-Chat-v1.0-fp16-ov\",\"choices\":[{\"index\":0,\"finish_reason\":\"stop\",\"text\":\"\\n\\nOpenVINO is a toolkit for Intel(r) OpenVINO(r) Toolkit, which is a toolkit for developing and deploying deep learning models on Intel(r) architecture. OpenVINO is designed to help developers implement deep learning models efficiently on Intel(r) architecture.\\n\\nThe main features of OpenVINO include:\\n\\n1. Develop and train deep learning models: OpenVINO provides a powerful toolset for developing and training deep learning models, including data augmentation, image resizing, and preprocessing.\\n\\n2. Compile and run models: OpenVINO provides a compilation and runtime environment for deploying trained models on Intel(r) architecture. Users can easily execute trained models on devices with Intel(r) architecture through the OpenVINO Runtime.\\n\\n3. Support for diverse deep learning frameworks: OpenVINO supports a wide range of deep learning frameworks, including TensorFlow, CNTK, and PyTorch, allowing users to choose the best framework for their project.\\n\\n4. Easy to use: OpenVINO is an open-source toolkit, which means it is easy to use and can be customized according to specific needs.\\n\\n5. Flexible deployment: OpenVINO provides a flexible deployment model, allowing users to deploy trained models on different types of devices, including CPU, GPU, and FPGA.\\n\\n6. Integration with OpenVINO tools: OpenVINO integrates with the OpenVINO Runtime, providing users with easy access to all the necessary components for developing, compiling, and running deep learning models.\\n\\nOverall, OpenVINO is a powerful toolkit for developing and deploying deep learning models on Intel(r) architecture, and it has many features that make it easy for developers to get started and integrate with various deep learning frameworks.\"}],\"usage\":{\"prompt_tokens\":0,\"completion_tokens\":0,\"total_tokens\":0}}"
     ]
    }
   ],
   "source": [
    "!curl http://localhost:8080/v1/completions -H \"Content-Type: application/json\" -d '{\"model\": \"TinyLlama-1.1B-Chat-v1.0-fp16-ov\", \"prompt\": \"What is OpenVINO?\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ba81e6b-9304-4a6c-8d29-e991150586ea",
   "metadata": {},
   "source": [
    "### Stop the server\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "e13dfb1f-54e2-4b94-9eaf-c90f5853290c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "localai\n"
     ]
    }
   ],
   "source": [
    "!docker stop localai"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "openvino_notebooks": {
   "imageUrl": "https://github.com/user-attachments/assets/abd938b3-32c2-4e4e-b7b7-fb1a2f9315ce",
   "tags": {
    "categories": [
     "API Overview"
    ],
    "libraries": [],
    "other": [
     "LLM"
    ],
    "tasks": [
     "Text Generation"
    ]
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
